Programming Massively Parallel Processors by David B. Kirk & Wen-mei W. Hwu
Author: David B. Kirk & Wen-mei W. Hwu
Language: eng
Format: epub
ISBN: 9780123914187
Publisher: Elsevier Inc.
Published: 2012-12-05T16:00:00+00:00
Figure 10.14 A sequential loop that implements SpMV/COO.
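The figure itself is not reproduced in this excerpt. A minimal sketch of such a loop, assuming the COO arrays are named data, col_index, and row_index as in the surrounding text, might look as follows. The function name spmv_coo and the parameter num_elem are assumptions made for illustration.

```c
#include <stddef.h>

/* Sequential SpMV over a COO matrix (illustrative sketch).
 * Each nonzero i carries its own row and column index, so the
 * elements may appear in any order. The caller is assumed to have
 * initialized y (for example to zero) before the call. */
void spmv_coo(int num_elem,
              const float *data,
              const int *col_index,
              const int *row_index,
              const float *x,
              float *y)
{
    for (int i = 0; i < num_elem; i++) {
        /* multiply-accumulate into the row this element belongs to */
        y[row_index[i]] += data[i] * x[col_index[i]];
    }
}
```

Because every element names its own row, the loop needs no row pointer array and no padding.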
The loop is extremely simple. It iterates through all the data elements and performs the multiply and accumulate operations on the appropriate x and y elements using the accompanying col_index and row_index elements. We will not present a parallel SpMV/COO kernel. It can be easily constructed by having each thread process a portion of the data elements and use an atomic operation to accumulate the results into the y elements. The atomic operation is needed because the threads are no longer mapped to particular rows, so several threads may update the same y element, and the atomic operation ensures that no thread tramples the contribution of another. In fact, many rows will likely be missing from the COO representation; only the rows that have an exceedingly large number of nonzero elements will have elements in the COO representation, which is a further reason not to assign threads to rows.
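The text deliberately omits a parallel kernel. As a rough host-side illustration of the idea only, the C11 sketch below shows the work one thread might do: take a contiguous portion of the elements and accumulate into y atomically. The names spmv_coo_chunk and atomic_add_float, the contiguous chunking scheme, and the use of C11 atomics instead of a GPU atomic add are all assumptions of this sketch, not the book's method.

```c
#include <stdatomic.h>

/* Atomically add value to *target with a compare-and-swap retry loop.
 * (Hypothetical helper; a GPU kernel would use its own atomic add.) */
static void atomic_add_float(_Atomic float *target, float value)
{
    float expected = atomic_load(target);
    /* On failure, expected is refreshed with the current value, so
     * the sum is recomputed from up-to-date data before retrying. */
    while (!atomic_compare_exchange_weak(target, &expected,
                                         expected + value)) {
    }
}

/* Work done by thread tid out of num_threads: process one contiguous
 * portion of the COO elements. Threads are not tied to rows, so two
 * threads may hit the same y element; the atomic add keeps both
 * contributions. */
void spmv_coo_chunk(int tid, int num_threads, int num_elem,
                    const float *data,
                    const int *col_index,
                    const int *row_index,
                    const float *x,
                    _Atomic float *y)
{
    int chunk = (num_elem + num_threads - 1) / num_threads; /* ceiling */
    int begin = tid * chunk;
    int end = begin + chunk < num_elem ? begin + chunk : num_elem;

    for (int i = begin; i < end; i++) {
        atomic_add_float(&y[row_index[i]], data[i] * x[col_index[i]]);
    }
}
```

Calling spmv_coo_chunk once per tid, from whatever threading mechanism is available, covers every element exactly once. With floating-point data the order of the atomic additions can change the rounding of the result slightly from run to run.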
The hybrid SpMV/ELL-COO method is a good illustration of productive use of both CPUs and GPUs in a heterogeneous computing system. The CPU can perform SpMV/COO quickly using its large cache memory. The GPU can perform SpMV/ELL quickly using its coalesced memory accesses and large number of hardware execution units. Removing some elements from the ELL format is a form of regularization: it reduces the disparity between long and short rows and makes the workload of all threads more uniform. This improved uniformity brings benefits such as less control divergence in an SpMV/CSR kernel or less padding in an SpMV/ELL kernel.
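To make the split concrete, the sketch below moves every element beyond a chosen width per row out of the ELL part and into a COO overflow list, then computes y from the two parts. It is a host-only illustration under several assumptions: the input is CSR, the ELL arrays are stored column-major with zero padding, and the function names and the ell_width parameter are hypothetical. In the hybrid scheme described above, the ELL step would run on the GPU and the COO step on the CPU.

```c
#include <stddef.h>

/* Split a CSR matrix into an ELL part of fixed width plus a COO
 * overflow. Slot k of row r is stored at index k * num_rows + r.
 * Returns the number of elements placed in the COO arrays. */
int csr_to_ell_coo(int num_rows, const int *row_ptr,
                   const int *csr_col, const float *csr_data,
                   int ell_width, float *ell_data, int *ell_col,
                   float *coo_data, int *coo_col, int *coo_row)
{
    int coo_count = 0;
    for (int r = 0; r < num_rows; r++) {
        int k = 0;
        for (int j = row_ptr[r]; j < row_ptr[r + 1]; j++, k++) {
            if (k < ell_width) {            /* fits in the ELL part */
                ell_data[k * num_rows + r] = csr_data[j];
                ell_col[k * num_rows + r] = csr_col[j];
            } else {                        /* overflow goes to COO */
                coo_data[coo_count] = csr_data[j];
                coo_col[coo_count] = csr_col[j];
                coo_row[coo_count] = r;
                coo_count++;
            }
        }
        for (; k < ell_width; k++) {        /* pad short rows */
            ell_data[k * num_rows + r] = 0.0f;
            ell_col[k * num_rows + r] = 0;
        }
    }
    return coo_count;
}

/* ELL part: every row does exactly ell_width multiply-adds. */
void spmv_ell(int num_rows, int ell_width, const float *ell_data,
              const int *ell_col, const float *x, float *y)
{
    for (int r = 0; r < num_rows; r++) {
        float dot = 0.0f;
        for (int k = 0; k < ell_width; k++) {
            dot += ell_data[k * num_rows + r] * x[ell_col[k * num_rows + r]];
        }
        y[r] = dot;
    }
}

/* COO part: add the overflow elements on top of the ELL result. */
void spmv_coo_accumulate(int num_elem, const float *coo_data,
                         const int *coo_col, const int *coo_row,
                         const float *x, float *y)
{
    for (int i = 0; i < num_elem; i++) {
        y[coo_row[i]] += coo_data[i] * x[coo_col[i]];
    }
}
```

A smaller ell_width means less padding in the ELL arrays but more elements left for the COO pass, so the width is a tuning choice that depends on how the row lengths are distributed.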